Zero-inflated models

Jo-Hannes Nowé (VLIZ)

Introduction

DTO-BioFlow DUC2

  • “Does the presence of an offshore wind farm (OWF) impact European seabass distribution in the Belgian part of the North Sea (BPNS)?”
  • presence/absence only -> information loss
  • counts per day or per month instead

Count data

Daily detection data with active tags

Monthly detection data with active tags

Modelling count data

  • Poisson: assumes variance \(\approx\) mean; variance well above the mean indicates overdispersion
  • Negative binomial: extra dispersion parameter, better suited to overdispersed counts
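The overdispersion check above can be sketched with simulated counts. This is a minimal illustration in base R, not the seabass data: the values of `n`, `mu` and `theta` and the helper `dispersion_ratio()` are assumptions chosen for the example.

```r
set.seed(42)

# Simulated counts (hypothetical values, not the detection data)
n     <- 5000
mu    <- 3     # mean count
theta <- 0.8   # NB size parameter; smaller values mean more overdispersion

y_pois <- rpois(n, lambda = mu)
y_nb   <- rnbinom(n, mu = mu, size = theta)

# Dispersion ratio: variance / mean (close to 1 for Poisson data)
dispersion_ratio <- function(y) var(y) / mean(y)

ratio_pois <- dispersion_ratio(y_pois)
ratio_nb   <- dispersion_ratio(y_nb)

# Theoretical NB ratio: (mu + mu^2 / theta) / mu = 1 + mu / theta
ratio_nb_theory <- 1 + mu / theta

print(c(poisson = ratio_pois, nb = ratio_nb, nb_theory = ratio_nb_theory))
```

A ratio well above 1 on real counts suggests that a Poisson model would underestimate the variance; it does not by itself say whether the excess comes from overdispersion or from extra zeros.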

Zero inflation

Zero-inflated models

What are zero-inflated models?

  • zero-inflated vs zero-truncated models
  • source of zeros
    • structural
    • design
    • observer
    • animal
  • hurdle model vs mixture model
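To make the idea of extra zeros concrete, the sketch below simulates counts in which some zeros come from a separate process and compares the observed share of zeros with what a plain negative binomial would produce. All parameter values (`mu`, `theta`, `p_extra`) are assumptions for illustration; with real data they would have to come from a fitted model.

```r
set.seed(1)

n       <- 10000
mu      <- 2     # NB mean of the count process
theta   <- 1.5   # NB size (dispersion) parameter
p_extra <- 0.4   # probability of an extra zero

# With probability p_extra the observation is forced to zero,
# otherwise it is drawn from the NB (which can itself yield zeros)
is_extra_zero <- rbinom(n, size = 1, prob = p_extra) == 1
y <- ifelse(is_extra_zero, 0L, rnbinom(n, mu = mu, size = theta))

# Observed share of zeros
zero_obs <- mean(y == 0)

# Share of zeros a plain NB with the same mu and theta would produce
zero_nb <- dnbinom(0, mu = mu, size = theta)

# Share of zeros implied by the two processes combined
zero_mix <- p_extra + (1 - p_extra) * zero_nb

print(c(observed = zero_obs, plain_nb = zero_nb, combined = zero_mix))
```

When the observed share of zeros clearly exceeds the share expected under the count distribution alone, a zero-inflated or hurdle formulation is worth considering.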

Two-part or hurdle model

  • zero-altered negative binomial (ZANB)
  • binomial model: probability of a presence (count > 0)
  • zero-truncated negative binomial: positive counts only
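The two parts listed above can be written as one probability mass function. The notation is an assumption introduced here: \(\pi\) is the probability of a presence, \(f_{\mathrm{NB}}\) is the negative binomial probability mass function with mean \(\mu\) and dispersion \(\theta\).

```latex
\[
P(Y = y) =
\begin{cases}
1 - \pi, & y = 0 \\[4pt]
\pi \, \dfrac{f_{\mathrm{NB}}(y;\mu,\theta)}{1 - f_{\mathrm{NB}}(0;\mu,\theta)}, & y = 1, 2, \dots
\end{cases}
\qquad
f_{\mathrm{NB}}(0;\mu,\theta) = \left(\frac{\theta}{\mu + \theta}\right)^{\theta}
\]
```

All zeros come from the binomial part; dividing by \(1 - f_{\mathrm{NB}}(0;\mu,\theta)\) rescales the negative binomial so that the positive counts sum to one.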

Mixture model

  • zero-inflated negative binomial (ZINB)
  • binomial model: probability of a false zero
  • negative binomial model: counts, including true zeros
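As a sketch of how the two components combine, the mixture can be written as follows. The notation is an assumption introduced here: \(p\) is the probability of a false zero, \(f_{\mathrm{NB}}\) is the negative binomial probability mass function with mean \(\mu\) and dispersion \(\theta\).

```latex
\[
P(Y = y) =
\begin{cases}
p + (1 - p)\, f_{\mathrm{NB}}(0;\mu,\theta), & y = 0 \\[4pt]
(1 - p)\, f_{\mathrm{NB}}(y;\mu,\theta), & y = 1, 2, \dots
\end{cases}
\qquad
\mathrm{E}[Y] = (1 - p)\,\mu
\]
```

Unlike the hurdle model, zeros here arise from both components, so an observed zero cannot be attributed to one process with certainty.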

Implementation in R

Negative binomial

# Negative binomial GAM fitted with mgcv (baseline without zero inflation)
nb <- mgcv::gam(
  count ~ s(lon, lat, k = 10) +          # spatial smooth
    s(lod, k = 5, bs = "cc") +           # cyclic smooth
    s(sst, k = 5) +
    s(min_dist_owf, k = 10) +
    s(min_dist_shipwreck, k = 10),
  # offset passed as an argument: mgcv ignores it in predict(),
  # unlike an offset() term inside the formula
  offset = log(active_tags),
  data = detections_day,
  family = "nb",
  method = "REML"
)

Zero-inflated negative binomial

library(glmmTMB)

# Zero-inflated negative binomial (mixture model)
# s() terms in glmmTMB formulas require a recent glmmTMB release
zinb <- glmmTMB(
  count ~ s(lon, lat, k = 10, bs = "tp") +
    s(lod, k = 5, bs = "cc") +
    s(sst, k = 5) +
    s(min_dist_owf, k = 10) +
    s(min_dist_shipwreck, k = 10) +
    offset(log(active_tags)),
  # zero-inflation part: logit model for the probability of an extra zero
  ziformula = ~ sst + min_dist_owf,
  family = nbinom2,
  REML = TRUE,
  data = detections_day
)

Zero-altered negative binomial (hurdle)

library(glmmTMB)

# Zero-altered negative binomial (hurdle model)
# s() terms in glmmTMB formulas require a recent glmmTMB release
zanb <- glmmTMB(
  count ~ s(lon, lat, k = 10, bs = "tp") +
    s(lod, k = 5, bs = "cc") +
    s(sst, k = 5) +
    s(min_dist_owf, k = 10) +
    s(min_dist_shipwreck, k = 10) +
    offset(log(active_tags)),
  # binomial part: glmmTMB models the probability of a zero (absence),
  # so coefficients have the opposite sign of a presence model
  ziformula = ~ sst + min_dist_owf,
  # truncated family: the count part describes positive counts only
  family = truncated_nbinom2,
  REML = TRUE,
  data = detections_day
)